+++
title = 'Introduction: Data'
template = 'page-math.html'
+++

# Introduction: Data

statistics: the science of data - collecting, organising, analysing, interpreting, presenting

sample: a selected subcollection from the population

## Collecting sample data

concepts:

- variables:
    - independent: might cause the effect being studied
    - dependent: represents the effect being studied
    - confounding: when the effects of several variables are mixed together, so you can't tell which one is causing the effect

sampling methods:

- voluntary response: subjects decide to be included
- random: each *member* of the population has equal probability to be selected
- simple random: each *sample of size n* has equal probability to be selected
- systematic: after starting point, select every k-th member (based on a system)
- convenience: choose what’s convenient
- stratified: split population into subgroups with same characteristics, then take a simple random sample from each group
- cluster: split population into clusters, then randomly select some of them

types of studies:

- observational study: subjects observed, not modified
    - retrospective: data from past
    - cross-sectional: data from one point in time
    - prospective: data to be collected (future)
- experiment: some treatment applied to subjects
    - sometimes control and treatment group
    - gotta watch out for placebo and observer effects

## Types of data

What to do with data?

- parameter: numerical measurement of *population* (Greek letters: μ, σ, ...)
- statistic: numerical measurement of *sample* (Latin letters: $\bar{x}$, s, ...)
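
To make the parameter/statistic distinction concrete, here is a minimal sketch in Python using only the standard library. The population is made up (1000 hypothetical body heights drawn from a normal distribution), and the seed and sample size are arbitrary choices for illustration. It draws a simple random sample and compares the parameters μ and σ with the statistics $\bar{x}$ and s.

```python
import random
import statistics

# Hypothetical population: 1000 made-up body heights in cm (assumption, not real data).
rng = random.Random(42)
population = [rng.gauss(170, 10) for _ in range(1000)]

# Parameters: numerical measurements of the whole population.
mu = statistics.fmean(population)      # population mean
sigma = statistics.pstdev(population)  # population standard deviation (divides by N)

# Simple random sample: every sample of size n is equally likely to be picked.
n = 50
sample = rng.sample(population, n)

# Statistics: numerical measurements of the sample only.
x_bar = statistics.fmean(sample)       # sample mean
s = statistics.stdev(sample)           # sample standard deviation (divides by n - 1)

print(f"parameters: mu = {mu:.2f}, sigma = {sigma:.2f}")
print(f"statistics: x_bar = {x_bar:.2f}, s = {s:.2f}")
```

The statistics will usually land close to the parameters but not exactly on them, and a different seed gives a different sample and therefore different statistics, while the parameters stay fixed.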

data can be:

- qualitative: names or labels (strings)
- quantitative: numbers (ints, floats)
    - discrete: countable
    - continuous: not countable (on a continuous scale like length, weight, distance)

you have different levels of measurement:

- qualitative:
    - nominal: no ordering (gender, eye color)
    - ordinal: ordering, but differences between categories have no meaning (e.g. agree/disagree)
- quantitative:
    - interval: ordering, differences, but no natural zero point (year of birth, temperatures in F/C)
    - ratio: ordering, differences, natural zero point (body length, marathon times)
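
The difference between interval and ratio level can be checked with a small sketch in Python. The values and helper function names are made up for illustration. The idea: on a ratio scale, "twice as much" stays true when you change units, while on an interval scale it does not, because the zero point is arbitrary.

```python
def celsius_to_fahrenheit(c):
    """Convert a temperature from degrees Celsius to degrees Fahrenheit."""
    return c * 9 / 5 + 32


def metres_to_centimetres(m):
    """Convert a length from metres to centimetres."""
    return m * 100


# Interval level: temperature in C/F has no natural zero point.
cold_c, warm_c = 10.0, 20.0
ratio_c = warm_c / cold_c  # 2.0
ratio_f = celsius_to_fahrenheit(warm_c) / celsius_to_fahrenheit(cold_c)  # 68 / 50 = 1.36

# Ratio level: length has a natural zero point (0 m = 0 cm = no length).
short_m, tall_m = 1.0, 2.0
ratio_m = tall_m / short_m  # 2.0
ratio_cm = metres_to_centimetres(tall_m) / metres_to_centimetres(short_m)  # 2.0

print(f"temperature ratio: {ratio_c} in C, {ratio_f} in F")
print(f"length ratio: {ratio_m} in m, {ratio_cm} in cm")
```

The temperature ratio changes with the unit, so "20 °C is twice as hot as 10 °C" is not a meaningful statement. The length ratio is the same in both units, so "2 m is twice as long as 1 m" is.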